The Corpus of Early Written Latvian:

Author

  • Everita Andronova
Abstract

The history of written Latvian dates back to the late 16th century, when both Protestantism and Catholicism reigned. Although the first surviving book in Latvian, Catechismvs Catholicorum, was published in 1585 not in Latvia but in Vilnius, the capital of neighbouring Catholic Lithuania, texts in Latvian and copies of them were distributed in Riga much earlier. Martin Luther’s ideas on preaching in the native language became very popular there, and there is written evidence of a first book in Latvian published in 1525, although it has not survived. Research on the history of written Latvian has been rather fragmentary. The delayed development of this branch of Baltic philology might be explained by the view expressed by Jānis Endzelīns, one of the most influential and well-known Latvian linguists, that the earliest texts were “written incorrectly (by Germans!)” and that the language in the texts is “full of mistakes” (Endzelīns, 1951: 22, 20). Another prominent linguist, Artūrs Ozols, stated that early written Latvian “is a distortion of the people’s language, a grouping of the words of this language according to the model of the German language” (Ozols, 1965: 8). These statements influenced the study of the language of the first written Latvian texts until the early 1980s. As a result, research on early Latvian texts long focused on describing mistakes in individual sources, and only in a few cases were attempts made to see, through the mistakes and the erroneous and sometimes obscure spelling, a reflection of the language system of the time. The Corpus of Early Written Latvian, named SENIE (www.ailab.lv/SENIE), is an effort to challenge these views and to support a completely new view of the language in these texts as a system.
The Corpus was first launched in January 2003, and its development is still in progress (it currently contains about one million running words). The aim of the Corpus is to facilitate diachronic studies of Latvian, to support studies of variants and language standardization, to serve as a basis for a historical dictionary of Latvian, and to popularise early written sources and support their re-evaluation.

Similar resources

Opinion Mining in Latvian Text Using Semantic Polarity Analysis and Machine Learning Approach

In this paper we demonstrate approaches for opinion mining in Latvian text. The authors have applied, combined and extended the results of several previous studies and public resources to perform opinion mining in Latvian text using two approaches, namely semantic polarity analysis and machine learning. One of the most significant constraints that make application of opinion mining for written content...

Lithuanian-Latvian-Lithuanian Parallel Corpus

The goal of the paper is to present the problems related to building a parallel corpus for two small languages, namely Latvian and Lithuanian. The Lithuanian-Latvian-Lithuanian Parallel Corpus (LILA) will contain 8 million running words and will be bidirectional and aligned at the sentence level. The problems include identifying, acquiring, preparing, and aligning parallel texts.

A Corpus-based Analysis of Epistemic Stance Adverbs in Essays Written by Native English Speakers and Iranian EFL Learners

Academic essays entail taking a stance on the truth value of propositions. Epistemic adverbs deal with the speaker's assessment of the truth value of propositions. Employing a corpus-based approach with descriptive statistics and qualitative description, this study explored the use of epistemic stance adverbs in academic essays written by native English speakers and Iranian EFL learners. Follow...

Toward a Comparable Corpus of Latvian, Russian and English Tweets

Twitter has become a rich source of linguistic data. Here, the possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia, is investigated. Such a corpus, once constructed, might be of great use for multiple purposes, including training machine translation models, examining cross-lingual phenomena and studying the population of Riga. This pilot study shows that...

Designing the Latvian Speech Recognition Corpus

In this paper the authors present the first Latvian speech corpus designed specifically for speech recognition purposes. The paper outlines the decisions made in the corpus design process through an analysis of related work on the creation of speech corpora for different languages. The authors also provide the guidelines that were used for the creation of the Latvian speech recognition corpus. The corpus ...

Publication date: 2007